
    Evaluating Gender Bias in Speech Translation

    Get PDF
    The scientific community is increasingly aware of the necessity to embrace pluralism and consistently represent both major and minor social groups. Currently, there are no standard evaluation techniques for different types of biases, so there is an urgent need for evaluation sets and protocols to measure the biases present in our automatic systems. Evaluating these biases is an essential step towards mitigating them. This paper introduces WinoST, a new freely available challenge set for evaluating gender bias in speech translation. WinoST is the speech version of WinoMT, an MT challenge set, and both follow the same evaluation protocol to measure gender accuracy. Using a state-of-the-art end-to-end speech translation system, we report the gender bias evaluation on four language pairs and show that gender accuracy in speech translation is more than 23% lower than in MT. Comment: Preprint
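    As a rough illustration of the gender-accuracy protocol shared by WinoMT and WinoST, the sketch below scores a system as the fraction of annotated entities whose translated gender matches the gold label. The `Example` fields and the `detect_gender` heuristic are hypothetical stand-ins for the challenge set's annotations and morphological checker, not the released evaluation code.

```python
# Minimal sketch of a WinoMT/WinoST-style gender-accuracy metric.
# The data format and detect_gender() are hypothetical stand-ins for the
# real challenge-set annotations and target-language morphological analysis.
from dataclasses import dataclass

@dataclass
class Example:
    entity: str          # profession word in the source, e.g. "doctor"
    gold_gender: str     # "male" or "female", from the challenge set
    translation: str     # system output for this sentence

def detect_gender(entity: str, translation: str) -> str:
    """Hypothetical check of the translated entity's morphological gender."""
    feminine_markers = ("a", "in", "euse")  # toy heuristic only
    for token in translation.lower().split():
        if entity[:4] in token:
            return "female" if token.endswith(feminine_markers) else "male"
    return "unknown"

def gender_accuracy(examples: list[Example]) -> float:
    correct = sum(detect_gender(e.entity, e.translation) == e.gold_gender
                  for e in examples)
    return correct / len(examples)
```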

    Sign Language Translation from Instructional Videos

    Full text link
    The advances in automatic sign language translation (SLT) to spoken languages have mostly been benchmarked on datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using reduced BLEU as the validation metric instead of the widely used BLEU score. We report a BLEU score of 8.03 and publish the first open-source implementation of its kind to promote further advances. Comment: Paper accepted at WiCV @ CVPR 2023
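    For context on the validation metric, here is a minimal sketch of a "reduced BLEU": corpus BLEU computed after filtering out frequent, low-content words. Both the stop-word list and the exact reduction are assumptions for illustration and may differ from the paper's definition.

```python
# Sketch of a "reduced BLEU" validation metric: corpus BLEU over hypotheses
# and references with frequent, low-content words removed first.
# The stop-word list below is a placeholder, not the paper's actual list.
import sacrebleu

STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "you"}

def reduce(sentence: str) -> str:
    return " ".join(w for w in sentence.split() if w.lower() not in STOP_WORDS)

def reduced_bleu(hypotheses: list[str], references: list[str]) -> float:
    hyps = [reduce(h) for h in hypotheses]
    refs = [[reduce(r) for r in references]]   # one reference per hypothesis
    return sacrebleu.corpus_bleu(hyps, refs).score
```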

    On the Locality of Attention in Direct Speech Translation

    Get PDF
    Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the complexity of the self-attention mechanism scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, as in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To show that some attention weights are avoidable, we propose to substitute the standard self-attention with an efficient local one, setting the amount of context based on the results of the analysis. With this approach, our model matches the baseline performance and improves efficiency by skipping the computation of the weights that standard attention discards. Comment: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando
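    A minimal sketch of this kind of local self-attention substitution is shown below: each position attends only to neighbours within a fixed window. The window size and the mask-based single-head implementation are illustrative; an efficient variant would avoid materialising the full T x T score matrix.

```python
# Sketch (PyTorch) of banded/local self-attention: positions may only attend
# to neighbours within +/- window, as suggested by the locality analysis.
import torch

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    # True where attention is NOT allowed (outside the +/- window band).
    return (idx[None, :] - idx[:, None]).abs() > window

def local_self_attention(x: torch.Tensor, window: int = 16) -> torch.Tensor:
    # x: (batch, seq_len, dim); single head, no projections, for brevity.
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d ** 0.5
    scores = scores.masked_fill(local_attention_mask(x.size(1), window),
                                float("-inf"))
    return torch.softmax(scores, dim=-1) @ x
```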

    SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

    Get PDF
    Speech translation models are unable to directly process long audios, such as TED talks, which have to be split into shorter segments. Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios, and existing segmentation methods usually significantly reduce translation quality at inference time. To bridge the gap between the manual segmentation used in training and the automatic one used at inference, we propose Supervised Hybrid Audio Segmentation (SHAS), a method that can effectively learn the optimal segmentation from any manually segmented speech corpus. First, we train a classifier to identify the frames included in a segmentation, using speech representations from a pre-trained wav2vec 2.0. The optimal splitting points are then found by a probabilistic Divide-and-Conquer algorithm that progressively splits at the frame of lowest probability until all segments are below a pre-specified length. Experiments on MuST-C and mTEDx show that the translation of the segments produced by our method approaches the quality of the manual segmentation on 5 language pairs. Specifically, SHAS retains 95-98% of the manual segmentation's BLEU score, compared to 87-93% for the best existing methods. Our method additionally generalizes to different domains and achieves high zero-shot performance on unseen languages. Comment: Submitted to Interspeech 2022, 5 pages. The previous version (v1) additionally has a 2-page Appendix
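    The probabilistic Divide-and-Conquer step can be sketched as follows, assuming per-frame inclusion probabilities from the classifier are already available; minimum-length handling and other practical details of SHAS are omitted.

```python
# Sketch of SHAS-style probabilistic Divide-and-Conquer segmentation:
# recursively split the audio at the frame least likely to belong to a
# segment until every segment is at most max_len frames long.

def divide_and_conquer(probs, start, end, max_len):
    """Return (start, end) frame spans, each no longer than max_len."""
    if end - start <= max_len:
        return [(start, end)]
    # Split at the frame with the lowest inclusion probability.
    split = min(range(start + 1, end - 1), key=lambda i: probs[i])
    return (divide_and_conquer(probs, start, split, max_len)
            + divide_and_conquer(probs, split, end, max_len))

# Example: frame-level probabilities for a toy 10-frame "audio".
segments = divide_and_conquer(
    probs=[0.9, 0.8, 0.1, 0.95, 0.9, 0.2, 0.85, 0.9, 0.9, 0.9],
    start=0, end=10, max_len=4)
print(segments)  # [(0, 2), (2, 5), (5, 6), (6, 10)]
```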

    Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

    Full text link
    This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, adapting the speech representations to the space of the text model and thus maximizing transfer learning from MT. After this pretraining, we fine-tune our system end-to-end on ST with Cross Entropy and Knowledge Distillation. Apart from the available ST corpora, we create synthetic data with SegAugment to better adapt our models to the custom segmentations of the IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C tst-COMMON, 29.8 points on IWSLT.tst2020, and 33.4 points on the newly released IWSLT.ACLdev2023. Comment: IWSLT 2023
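    A hedged sketch of how a CTC term and an Optimal Transport term might be combined in such a Siamese pretraining objective is given below; the tiny Sinkhorn solver, the tensor shapes, and the loss weight are illustrative assumptions, not the submission's actual code.

```python
# Sketch of a Siamese pretraining objective: CTC on the speech encoder plus
# an entropic Optimal Transport distance that pulls speech representations
# towards the text encoder's space. All shapes and weights are illustrative.
import torch
import torch.nn.functional as F

def sinkhorn_distance(x, y, eps=0.1, iters=50):
    """Entropic OT distance between two sequences of state vectors."""
    # x: (n, d) speech states, y: (m, d) text states (single example).
    cost = torch.cdist(x, y) ** 2                        # (n, m)
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    K = torch.exp(-cost / eps) + 1e-9                    # kernel, kept > 0
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()

def siamese_pretraining_loss(speech_states, speech_logits, text_states,
                             targets, input_len, target_len, ot_weight=1.0):
    # CTC over the speech encoder's vocabulary projections.
    # speech_logits: (batch, frames, vocab) -> (frames, batch, vocab) for CTC.
    log_probs = F.log_softmax(speech_logits, dim=-1).transpose(0, 1)
    ctc = F.ctc_loss(log_probs, targets, input_len, target_len)
    # OT alignment between speech and text encoder states (first example only).
    ot = sinkhorn_distance(speech_states[0], text_states[0])
    return ctc + ot_weight * ot
```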

    Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

    Get PDF
    In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has focused mainly on the attributions of source sentence tokens. Therefore, we lack a full understanding of the influence of every input token (source sentence and target prefix) on the model's predictions. In this work, we propose an interpretability method that tracks input tokens' attributions for both contexts. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour. Comment: EMNLP 2022
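    The paper's method tracks token contributions through the Transformer's layers; purely as a generic illustration of attributing one prediction to both the source sentence and the target prefix, the sketch below uses gradient x input instead, with a hypothetical `model` callable that maps input embeddings to logits. It is not the paper's method.

```python
# Generic gradient-x-input attribution over BOTH input contexts of an
# encoder-decoder model: source embeddings and target-prefix embeddings.
# `model` is a hypothetical callable: (src_embeds, tgt_prefix_embeds) -> logits.
import torch

def grad_x_input_attributions(model, src_embeds, tgt_prefix_embeds, position):
    src_embeds = src_embeds.detach().clone().requires_grad_(True)
    tgt_prefix_embeds = tgt_prefix_embeds.detach().clone().requires_grad_(True)
    logits = model(src_embeds, tgt_prefix_embeds)        # (1, tgt_len, vocab)
    predicted = logits[0, position].argmax()
    logits[0, position, predicted].backward()
    # One attribution score per token, covering both contexts of the prediction.
    src_attr = (src_embeds.grad * src_embeds).sum(-1).abs()[0]
    tgt_attr = (tgt_prefix_embeds.grad * tgt_prefix_embeds).sum(-1).abs()[0]
    return src_attr, tgt_attr
```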

    Efficient Speech Translation with Dynamic Latent Perceivers

    Full text link
    Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. In this research, we instead propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference and can be flexibly deployed with various computational budgets without significant drops in translation quality. Comment: ICASSP 2023
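    A minimal sketch of the Dynamic Latent Access idea as described above: a large bank of latent vectors of which only a subset is cross-attended per step, keeping the per-step cost fixed while the latent space grows. Module sizes, the random subset selection, and the single cross-attention layer are assumptions for illustration.

```python
# Sketch of a Perceiver-style encoder with Dynamic Latent Access: a large
# latent bank, a random subset of which is used for cross-attention at each
# training step; at inference the subset size can match the compute budget.
import torch
import torch.nn as nn

class DLAPerceiverEncoder(nn.Module):
    def __init__(self, dim=256, num_latents=1024, latents_per_step=64):
        super().__init__()
        self.latent_bank = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.latents_per_step = latents_per_step
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4,
                                                batch_first=True)

    def forward(self, speech_feats, num_latents=None):
        # speech_feats: (batch, frames, dim).
        k = num_latents or self.latents_per_step
        idx = torch.randperm(self.latent_bank.size(0))[:k]
        latents = self.latent_bank[idx].unsqueeze(0).expand(
            speech_feats.size(0), -1, -1)
        out, _ = self.cross_attn(latents, speech_feats, speech_feats)
        return out   # (batch, k, dim) fixed-length representation
```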

    Molecular diagnostics for Chagas disease: up to date and novel methodologies

    Full text link
    Chagas disease is caused by the parasite Trypanosoma cruzi. It affects 7 million people, mainly in Latin America. Diagnosis is usually made serologically, but in some clinical scenarios serology cannot be used. In those cases, molecular detection is required for the early detection of congenital transmission, follow-up of treatment response, and diagnosis of reactivation under immunosuppression. However, present tests are technically demanding and require well-equipped laboratories, which makes them unfeasible in low-resource endemic regions.

    DNA-origami-aided lithography for sub-10 nanometer pattern printing

    Get PDF
    We report the first DNA-origami-based technique that can print addressable patterns on surfaces with sub-10 nm resolution. Specifically, we have used a two-dimensional DNA origami as a template (the DNA origami stamp) to transfer DNA with pre-programmed patterns (the DNA ink) onto gold surfaces. The DNA ink is composed of thiol-modified staple strands incorporated at specific positions of the DNA origami stamp, which create patterns upon thiol-gold bond formation on the surface. The resulting DNA pattern is composed of unique oligonucleotide sequences, each of which is individually addressable. As a proof of concept, we created a linear pattern of oligonucleotide-modified gold nanoparticles complementary to the DNA ink pattern. We have also developed an in silico model to identify key elements in the formation of our DNA-origami-driven lithography and nanoparticle patterning, as well as to simulate more complex nanoparticle patterns on surfaces.